# Image-text matching

A vision-language model fine-tuned from CLIP-ViT-B/32, suitable for image-text matching tasks.

MEXMA-SigLIP2 is a CLIP-style model that combines the MEXMA multilingual text encoder with the SigLIP2 image encoder and supports 80 languages.

Text-to-Image, Supports Multiple Languages

Clip Vit Tiny Random Patch14 336

A small CLIP model intended for debugging, based on the ViT architecture with randomly initialized weights.

Longclip GmP ViT L 14

A CLIP model fine-tuned from BeichenZhang/LongCLIP-L. It supports long-text input (248 tokens), and its performance is improved through Geometric Parametrization (GmP).

Resnet101 Clip.openai

A CLIP model based on the ResNet101 architecture, supporting zero-shot image classification tasks.

Image Classification

Resnet50 Clip.openai

A zero-shot image classification model based on the ResNet50 architecture and CLIP.

Image Classification
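
The models listed above share one scoring idea for image-text matching and zero-shot classification: embed the image and each candidate caption, then compare the embeddings. The following is a minimal sketch of that scoring step only, not any listed model's actual implementation. The embeddings are hypothetical toy vectors standing in for real encoder outputs, the function names are illustrative, and the `logit_scale` of 100 is an assumed temperature rather than a value taken from this page.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return softmax probabilities over candidate captions for one image.

    image_emb: image embedding (hypothetical encoder output)
    text_embs: one embedding per candidate caption
    logit_scale: assumed temperature applied to the cosine similarities
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(value - max_logit) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Toy data: in practice these vectors would come from a model's encoders.
labels = ["a photo of a cat", "a photo of a dog"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best)
```

The softmax here treats the captions as mutually exclusive classes. Models trained with a sigmoid objective, as the SigLIP family is, would typically score each image-text pair independently instead.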
